ISO2024 INTRODUCTORY SPATIAL 'OMICS ANALYSIS
- HYBRID : TORONTO & ZOOM
- 9TH JULY 2024
**Module 3: Building your spatial model**
Instructor : Shamini Ayyadhury
TOPICS COVERED
- A. Normalizing your data *
- B. Spatial and non-spatial clustering *
B. SPATIAL & NON-SPATIAL CLUSTERING
### Import the following libraries. Some were used in the previous notebook and are reused here
import sys # system specific parameters and functions
import pandas as pd # data manipulation and analysis
import numpy as np # numerical computing library
import matplotlib.pyplot as plt # plotting library
import seaborn as sns # data visualization library based on matplotlib
import scanpy as sc # single-cell analysis in Python
import os # operating system dependent functionality
sys.path.append('/home/shamini//data/projects/spatial_workshop/')
sys.path.append('/home/shamini//data/projects/spatial_workshop/Banksy_py')
import pre_processing_fnc as ppf # from here onwards we will only use the function for memory regulation
### directory & filepaths
data_dir = '/home/shamini/data1/data_orig/data/spatial/xenium/10xGenomics/'
out = '/home/shamini/data/projects/spatial_workshop/out/'
os.makedirs(out+'module3/TgCRND8_17_8mths', exist_ok=True) # create a new directory to store the output files
color = ppf.colors()
STOP TO DISCUSS
### IMPORT DATA
'''
ANNDATA OBJECT FROM MODULE 2
We will use the anndata object from module 2 to perform the spatial analysis to ensure a clean workflow and to avoid any discrepancies in the data.
'''
adata = sc.read_h5ad(out+'module2/TgCRND8_17_8mths/adata.h5ad')
cell_label = pd.read_csv(out+'module3/allen_annotations/TgCRND8_allen.csv', index_col=0)
cell_label = cell_label['predicted.id']
# Subset adata to include only the cells present in cell_label
common_indices = adata.obs.index.intersection(cell_label.index)
adata = adata[common_indices].copy()
# Reindex cell_label to match adata_subset
cell_label_subset = cell_label.reindex(adata.obs.index)
adata.obs['cell_label'] = cell_label_subset
adata = adata[~adata.obs['cell_label'].isna()].copy()
print(adata)
AnnData object with n_obs × n_vars = 60099 × 347
obs: 'total_counts', 'neg_counts', 'x_location', 'y_location', 'n_genes', 'n_counts', 'cell_label'
var: 'n_cells'
uns: 'spatial'
obsm: 'spatial'
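The intersection-and-reindex pattern used above to align the Allen annotations with the AnnData object can be sketched on a toy example (hypothetical barcodes and labels, pandas only):

```python
import pandas as pd

# Hypothetical cell barcodes: the matrix has one cell the label table lacks,
# and the label table has one the matrix lacks.
obs = pd.DataFrame(index=['cell_a', 'cell_b', 'cell_c'])
labels = pd.Series({'cell_a': 'astrocyte', 'cell_c': 'neuron', 'cell_d': 'microglia'},
                   name='predicted.id')

common = obs.index.intersection(labels.index)  # cells present in both
obs = obs.loc[common]                          # drop unlabeled cells
aligned = labels.reindex(obs.index)            # labels now in matrix order
print(aligned.tolist())
```

Reindexing after the intersection guarantees the label Series follows the row order of the matrix, so it can be assigned directly into `adata.obs`.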
WE WILL PERFORM A VERY STANDARD WORKFLOW HERE TO UNDERSTAND THE CONCEPTS OF DIMENSIONALITY REDUCTION, REDUCED-DIMENSION VISUALIZATION & CLUSTER FORMATION. WE WILL PERFORM BOTH PCA-BASED AND COORDINATE-BASED DIMENSIONALITY REDUCTION.
We will work on the basic code for spatial and non-spatial clustering. After understanding the principles, you will be provided with an exercise using one of the various spatial clustering packages (Banksy)
'''
Take note of these packages.
'''
from sklearn.decomposition import PCA
from sklearn.neighbors import NearestNeighbors
import igraph as ig
import leidenalg as la
import umap
DISCUSSION/LECTURE There are two ways to perform your clustering.
- non-spatial clustering
- spatial clustering
We will follow the basic steps of both in the next example just to walk you through the principles of the pipeline before we begin our lecture.
### We will normalize all data using Pearson residuals henceforth
adata.X = adata.raw.X.copy() # restore the raw counts before computing Pearson residuals
sc.experimental.pp.normalize_pearson_residuals(adata)
print('normalized')
normalized
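To see what the normalization is doing, here is a minimal numpy sketch of analytic Pearson residuals (the idea behind `sc.experimental.pp.normalize_pearson_residuals`): expected counts come from row and column totals under an NB model with fixed overdispersion `theta`, and residuals are clipped. The toy matrix and the exact clipping convention are assumptions for illustration, not scanpy's implementation.

```python
import numpy as np

def pearson_residuals(X, theta=100.0):
    """Analytic Pearson residuals for a cells x genes count matrix (sketch)."""
    X = np.asarray(X, dtype=float)
    total = X.sum()
    # expected count if every gene were expressed proportionally in every cell
    mu = X.sum(axis=1, keepdims=True) * X.sum(axis=0, keepdims=True) / total
    resid = (X - mu) / np.sqrt(mu + mu ** 2 / theta)
    clip = np.sqrt(X.shape[0])  # clip to +/- sqrt(n_cells)
    return np.clip(resid, -clip, clip)

counts = np.array([[10, 0, 5],
                   [ 2, 3, 0],
                   [ 8, 1, 4]])
res = pearson_residuals(counts)
print(res.shape)
```

The residuals are roughly variance-stabilized, so highly expressed genes no longer dominate the PCA that follows.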
- First we will perform dimensional reduction with PCA before performing all downstream operations
### Perform PCA
pca = PCA(n_components=21)
adata.obsm['sk_pca'] = pca.fit_transform(adata.X)
print('pca done')
### Compute PCA-based neighbors
nbrs = NearestNeighbors(n_neighbors=27, algorithm='ball_tree').fit(adata.obsm['sk_pca'])
distances, indices = nbrs.kneighbors(adata.obsm['sk_pca'])
print('neighbors computed')
### Convert the neighbor information into a graph format suitable for Leiden clustering
edges = [(i, indices[i, j]) for i in range(indices.shape[0]) for j in range(1, indices.shape[1])] # start j at 1 to skip each point's self-neighbour
print('edges created')
### Create an igraph from edges
g = ig.Graph(edges=edges)
print('graph created')
### Perform Leiden clustering on the constructed graph
partition = la.find_partition(g, la.RBConfigurationVertexPartition, resolution_parameter=0.3)
print('clustering done')
adata.obs['leiden_pca'] = partition.membership
### Initialize UMAP reducer
reducer = umap.UMAP(n_neighbors=27, min_dist=0.3, metric='euclidean')
### Fit and transform the PCA embedding with UMAP
adata.obsm['umap_graph_pca'] = reducer.fit_transform(adata.obsm['sk_pca'])
print('umap done')
#----------------------
ppf.get_memory_usage() ### monitor memory usage
pca done
neighbors computed
edges created
graph created
clustering done
umap done
'Memory usage: 1920.22 MB'
- Extract the spatial coordinates and perform Leiden clustering directly on them
### Perform clustering using spatial coordinates
spatial = adata.obs[['x_location', 'y_location']].values
print('spatial coordinates extracted')
nbrs = NearestNeighbors(n_neighbors=50, algorithm='ball_tree').fit(spatial)
distances, indices = nbrs.kneighbors(spatial)
print('neighbors computed directly from spatial coordinates')
### Create edges based on k-nearest neighbors
edges = [(i, indices[i, j]) for i in range(indices.shape[0]) for j in range(1, indices.shape[1])]
print('edges created')
### Create an igraph from edges
g = ig.Graph(edges=edges)
print('graph created')
### Perform Leiden clustering on the constructed graph
partition = la.find_partition(g, la.RBConfigurationVertexPartition, resolution_parameter=0.3)
print('clustering done')
adata.obs['leiden_spatial'] = partition.membership
### Fit and transform the spatial coordinates with UMAP
adata.obsm['umap_graph_spatial'] = reducer.fit_transform(spatial)
print('umap done')
#----------------------
ppf.get_memory_usage() ### monitor memory usage
spatial coordinates extracted
neighbors computed directly from spatial coordinates
edges created
graph created
clustering done
umap done
'Memory usage: 2184.08 MB'
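Before eyeballing the plots, the two partitions can also be compared numerically with the adjusted Rand index, using the `sklearn` stack already imported above. The sketch below uses toy stand-in labels so it runs on its own; on the real object you would pass `adata.obs['leiden_pca']` and `adata.obs['leiden_spatial']` directly.

```python
from sklearn.metrics import adjusted_rand_score

# Toy stand-ins for the two clustering columns (hypothetical labels)
leiden_pca     = [0, 0, 1, 1, 2, 2]
leiden_spatial = [0, 0, 1, 2, 2, 2]

ari = adjusted_rand_score(leiden_pca, leiden_spatial)
print(f"ARI: {ari:.2f}")  # 1.0 = identical partitions, ~0 = unrelated
```

A low ARI here is expected: clustering on expression-derived PCs and clustering on raw coordinates answer different questions.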
### plot embeddings for both spatial and non-spatial clustering
# Plotting parameters
# -------------------
fig, axes = plt.subplots(1, 2, figsize=(12, 5))
s = 1
fig.suptitle('B1. Non-spatial and Spatial-based Leiden clustering', fontsize=16, x=0.3, y=1.05)
# -------------------
sns.scatterplot(x=adata.obs['x_location'], y=adata.obs['y_location'], hue=adata.obs['leiden_pca'], s=s, palette=color, ax=axes[0], legend=False)
axes[0].legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
sns.scatterplot(x=adata.obs['x_location'], y=adata.obs['y_location'], hue=adata.obs['leiden_spatial'], s=s, palette=color, ax=axes[1], legend=False)
axes[1].legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
sns.despine()
plt.tight_layout()
LET'S STOP AND REVIEW
- What did we just do?
- Which matters more, non-spatial or spatial?
- When choosing a spatial method or developing one, what are the parameters that we need to consider?
BEGIN THE LECTURE PROPER
Below is a script adapted from https://github.com/prabhakarlab/Banksy_py/.
Before running each cell, think about what the algorithm is doing to incorporate spatial features
TUTORIAL LECTURE
STEP 1 :
'''
1. FIRST WE WILL PERFORM A NEAREST NEIGHBOR BASED DISTANCE CALCULATION TO COMPUTE THE NECESSARY DISTANCES BETWEEN THE CELLS
'''
# set params
# ==========
plot_graph_weights = True
k_geom = 15 # number of spatial neighbours (fixed-type neighbourhood)
max_m = 1 # azimuthal (AGF) transform up to the mth order
nbr_weight_decay = "scaled_gaussian" # can also be "reciprocal", "uniform" or "ranked"
from Banksy_py.banksy.main import median_dist_to_nearest_neighbour
# Find median distance to closest neighbours, the median distance will be `sigma`
nbrs = median_dist_to_nearest_neighbour(adata, key = 'spatial')
from banksy.initialize_banksy import initialize_banksy
banksy_dict = initialize_banksy(
adata,
('x_location', 'y_location', 'spatial'),
k_geom,
nbr_weight_decay=nbr_weight_decay,
max_m=max_m,
plt_edge_hist=False,
plt_nbr_weights=False,
plt_agf_angles=False, # takes long time to plot
plt_theta=False,
)
banksy_dict
### remove all the warnings and messages from the output
#----------------------
ppf.get_memory_usage() ### monitor memory usage
Median distance to closest cell = 11.978918604010019
---- Ran median_dist_to_nearest_neighbour in 0.21 s ----
---- Ran generate_spatial_distance_graph in 0.45 s ----
---- Ran row_normalize in 0.22 s ----
---- Ran generate_spatial_weights_fixed_nbrs in 2.73 s ----
---- Ran generate_spatial_distance_graph in 0.68 s ----
---- Ran theta_from_spatial_graph in 0.61 s ----
---- Ran row_normalize in 0.23 s ----
---- Ran generate_spatial_weights_fixed_nbrs in 3.57 s ----
'Memory usage: 2229.33 MB'
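The `nbr_weight_decay` options control how a neighbour's contribution falls off with distance. A toy numpy illustration of two of them is below; the exact BANKSY formulas may differ slightly, and `sigma` here simply plays the role of the median nearest-neighbour distance computed above.

```python
import numpy as np

dists = np.array([5.0, 12.0, 20.0, 35.0])  # distances to 4 neighbours
sigma = 12.0  # stand-in for the median distance to the closest cell

# scaled_gaussian: weight falls off as exp(-d^2 / (2 * sigma^2))
w_gauss = np.exp(-dists**2 / (2 * sigma**2))

# reciprocal: weight falls off as 1 / d
w_recip = 1.0 / dists

# weights are row-normalized so each cell's neighbourhood sums to 1
w_gauss /= w_gauss.sum()
w_recip /= w_recip.sum()
print(np.round(w_gauss, 3), np.round(w_recip, 3))
```

Either way, nearby cells dominate the neighbourhood average; the Gaussian simply suppresses distant neighbours much more sharply.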
TUTORIAL LECTURE
STEP 2 :
'''
2. NEXT WE WILL CONSTRUCT A BANKSY MATRIX
'''
from Banksy_py.banksy.embed_banksy import generate_banksy_matrix
### the following are the main hyperparameters for the banksy algorithm
### ------------------------------------------------------------------
resolutions = [0.3] ### Leiden clustering resolution
pca_dims = [18] ### number of principal components to reduce the data to
lamda_list = [0, 0.25, 0.50, 0.75, 1.00] ### list of lambda values; higher values weight the neighbourhood more, giving more domain-like clusters
banksy_dict, banksy_matrix = generate_banksy_matrix(adata, banksy_dict, lamda_list, max_m, verbose=False)
Decay Type: scaled_gaussian
Weights Object: {'weights': {0: <60099x60099 sparse matrix of type '<class 'numpy.float64'>'
with 901485 stored elements in Compressed Sparse Row format>, 1: <60099x60099 sparse matrix of type '<class 'numpy.complex128'>'
with 1802970 stored elements in Compressed Sparse Row format>}}
Nbr matrix | Mean: 0.01 | Std: 0.84
Size of Nbr | Shape: (60099, 347)
Top 3 entries of Nbr Mat:
[[-0.56548718 0.04219709 0.39646856]
[-0.56198685 0.34703652 0.39380354]
[-0.89019994 0.73978649 -0.31772154]]
AGF matrix | Mean: 0.2 | Std: 0.27
Size of AGF mat (m = 1) | Shape: (60099, 347)
Top entries of AGF:
[[0.21517585 0.21879605 0.25946109]
[0.28190915 0.22844802 0.24155257]
[0.15627197 0.25972449 0.11935542]]
Ran 'Create BANKSY Matrix' in 0.15 mins
Scale factors squared: [1. 0. 0.]
Scale factors: [1. 0. 0.]
Scale factors squared: [0.75 0.16666667 0.08333333]
Scale factors: [0.8660254 0.40824829 0.28867513]
Scale factors squared: [0.5 0.33333333 0.16666667]
Scale factors: [0.70710678 0.57735027 0.40824829]
Scale factors squared: [0.25 0.5 0.25]
Scale factors: [0.5 0.70710678 0.5 ]
Scale factors squared: [0. 0.66666667 0.33333333]
Scale factors: [0. 0.81649658 0.57735027]
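The printed scale factors follow a simple pattern: for `max_m = 1`, the squared weights given to the [own-expression, neighbour-mean, AGF] blocks appear to be `[1 - lambda, 2*lambda/3, lambda/3]`. The function below is a reconstruction inferred from the log above, not code from Banksy_py itself.

```python
import numpy as np

def scale_factors(lam, max_m=1):
    # Squared weights for [self, neighbour mean, AGF order 1];
    # inferred from the printed output for max_m = 1 (an assumption).
    sq = np.array([1 - lam, 2 * lam / 3, lam / 3])
    return np.sqrt(sq)

for lam in [0, 0.25, 0.50, 0.75, 1.00]:
    print(lam, np.round(scale_factors(lam), 8))
```

So at lambda = 0 only the cell's own expression survives, and at lambda = 1 the matrix is built entirely from neighbourhood features.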
TUTORIAL LECTURE
STEP 3 :
### append non-spatial results to the banksy_dict for comparison
from banksy.main import concatenate_all
banksy_dict['nonspatial'] = {### here we append the non-spatial matrix (adata.X) to obtain the non-spatial clustering results
0.0: {"adata": concatenate_all([adata.X], 0, adata=adata), }
}
Scale factors squared: [1.] Scale factors: [1.]
'''
3. BANKSY APPLIES PCA AND UMAP OVER THE SPATIALLY DERIVED MATRIX, FOLLOWED BY LEIDEN CLUSTERING
'''
from banksy_utils.umap_pca import pca_umap
pca_umap(banksy_dict,
pca_dims = pca_dims,
add_umap = True,
plt_remaining_var = False,
verbose = False)
from banksy.cluster_methods import run_Leiden_partition
seed=329
results_df, max_num_labels = run_Leiden_partition(
banksy_dict,
resolutions,
num_nn = 50,
num_iterations = -1,
partition_seed = seed,
match_labels = True,
verbose = False
)
#----------------------
ppf.get_memory_usage() ### monitor memory usage
Current decay types: ['scaled_gaussian', 'nonspatial'] Reducing dims of dataset in (Index = scaled_gaussian, lambda = 0) ================================================== Setting the total number of PC = 18 Original shape of matrix: (60099, 1041) Reduced shape of matrix: (60099, 18) ------------------------------------------------------------ min_value = -15.255406718247182, mean = 1.64462706540663e-16, max = 26.377681341251005 Conducting UMAP and adding embeddings to adata.obsm["reduced_pc_18_umap"] UMAP embedding ------------------------------------------------------------ shape: (60099, 2) AxisArrays with keys: reduced_pc_18, reduced_pc_18_umap Reducing dims of dataset in (Index = scaled_gaussian, lambda = 0.25) ================================================== Setting the total number of PC = 18 Original shape of matrix: (60099, 1041) Reduced shape of matrix: (60099, 18) ------------------------------------------------------------ min_value = -14.054746119252238, mean = -8.360083585005625e-17, max = 22.36550860738117 Conducting UMAP and adding embeddings to adata.obsm["reduced_pc_18_umap"] UMAP embedding ------------------------------------------------------------ shape: (60099, 2) AxisArrays with keys: reduced_pc_18, reduced_pc_18_umap Reducing dims of dataset in (Index = scaled_gaussian, lambda = 0.5) ================================================== Setting the total number of PC = 18 Original shape of matrix: (60099, 1041) Reduced shape of matrix: (60099, 18) ------------------------------------------------------------ min_value = -15.180002027701372, mean = -3.772809562560678e-17, max = 17.114937711486206 Conducting UMAP and adding embeddings to adata.obsm["reduced_pc_18_umap"] UMAP embedding ------------------------------------------------------------ shape: (60099, 2) AxisArrays with keys: reduced_pc_18, reduced_pc_18_umap Reducing dims of dataset in (Index = scaled_gaussian, lambda = 0.75) ================================================== Setting 
the total number of PC = 18 Original shape of matrix: (60099, 1041) Reduced shape of matrix: (60099, 18) ------------------------------------------------------------ min_value = -18.4084117915282, mean = -2.2384636123270876e-17, max = 17.177665539372057 Conducting UMAP and adding embeddings to adata.obsm["reduced_pc_18_umap"] UMAP embedding ------------------------------------------------------------ shape: (60099, 2) AxisArrays with keys: reduced_pc_18, reduced_pc_18_umap Reducing dims of dataset in (Index = scaled_gaussian, lambda = 1.0) ================================================== Setting the total number of PC = 18 Original shape of matrix: (60099, 1041) Reduced shape of matrix: (60099, 18) ------------------------------------------------------------ min_value = -17.166236691375495, mean = -3.8962928838979706e-17, max = 21.776413781965168 Conducting UMAP and adding embeddings to adata.obsm["reduced_pc_18_umap"] UMAP embedding ------------------------------------------------------------ shape: (60099, 2) AxisArrays with keys: reduced_pc_18, reduced_pc_18_umap Reducing dims of dataset in (Index = nonspatial, lambda = 0.0) ================================================== Setting the total number of PC = 18 Original shape of matrix: (60099, 347) Reduced shape of matrix: (60099, 18) ------------------------------------------------------------ min_value = -15.255406699686917, mean = 1.6722794474720554e-16, max = 26.377464708199287 Conducting UMAP and adding embeddings to adata.obsm["reduced_pc_18_umap"] UMAP embedding ------------------------------------------------------------ shape: (60099, 2) AxisArrays with keys: reduced_pc_18, reduced_pc_18_umap Decay type: scaled_gaussian Neighbourhood Contribution (Lambda Parameter): 0.25 reduced_pc_18 reduced_pc_18_umap PCA dims to analyse: [18] ==================================================================================================== Setting up partitioner for (nbr decay = scaled_gaussian), Neighbourhood 
contribution = 0.25, PCA dimensions = 18) ====================================================================================================
/home/shamini//data/projects/spatial_workshop/Banksy_py/banksy/cluster_methods.py:18: UserWarning: No rpy2 installed. BANKSY will run, but mclust will not work. Note: you can still use the default leiden option for clustering. Install rpy2 and R in your conda environment if you want to use mclust for clustering. warnings.warn(warn_str)
---- Ran find_nn in 54.91 s ---- Nearest-neighbour connectivity graph (dtype: int16, shape: (60099, 60099)) has 3004950 nonzero entries. (after computing shared NN) Allowing nearest neighbours only reduced the number of shared NN from 43124657 to 3002492. ---- Ran shared_nn in 3.41 s ---- -- Multiplying sNN connectivity by weights -- shared NN with distance-based weights graph (dtype: float64, shape: (60099, 60099)) has 2920288 nonzero entries. shared NN weighted graph data: [0.17411897 0.17421001 0.17477399 ... 0.25654574 0.26286154 0.2837969 ] Converting graph (dtype: float64, shape: (60099, 60099)) has 2920288 nonzero entries. ---- Ran csr_to_igraph in 1.19 s ---- Resolution: 0.3 ------------------------------ ---- Partitioned BANKSY graph ---- modularity: 0.86 16 unique labels: [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15] ---- Ran partition in 29.84 s ---- No annotated labels Neighbourhood Contribution (Lambda Parameter): 0.5 reduced_pc_18 reduced_pc_18_umap PCA dims to analyse: [18] ==================================================================================================== Setting up partitioner for (nbr decay = scaled_gaussian), Neighbourhood contribution = 0.5, PCA dimensions = 18) ==================================================================================================== ---- Ran find_nn in 57.42 s ---- Nearest-neighbour connectivity graph (dtype: int16, shape: (60099, 60099)) has 3004950 nonzero entries. (after computing shared NN) Allowing nearest neighbours only reduced the number of shared NN from 40323023 to 3002814. ---- Ran shared_nn in 3.11 s ---- -- Multiplying sNN connectivity by weights -- shared NN with distance-based weights graph (dtype: float64, shape: (60099, 60099)) has 2925134 nonzero entries. shared NN weighted graph data: [0.17811225 0.17856195 0.17867833 ... 0.21758902 0.23808041 0.24517302] Converting graph (dtype: float64, shape: (60099, 60099)) has 2925134 nonzero entries. 
---- Ran csr_to_igraph in 1.14 s ---- Resolution: 0.3 ------------------------------ ---- Partitioned BANKSY graph ---- modularity: 0.85 14 unique labels: [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13] ---- Ran partition in 55.61 s ---- No annotated labels Neighbourhood Contribution (Lambda Parameter): 0.75 reduced_pc_18 reduced_pc_18_umap PCA dims to analyse: [18] ==================================================================================================== Setting up partitioner for (nbr decay = scaled_gaussian), Neighbourhood contribution = 0.75, PCA dimensions = 18) ==================================================================================================== ---- Ran find_nn in 47.89 s ---- Nearest-neighbour connectivity graph (dtype: int16, shape: (60099, 60099)) has 3004950 nonzero entries. (after computing shared NN) Allowing nearest neighbours only reduced the number of shared NN from 37926963 to 3003204. ---- Ran shared_nn in 2.91 s ---- -- Multiplying sNN connectivity by weights -- shared NN with distance-based weights graph (dtype: float64, shape: (60099, 60099)) has 2934271 nonzero entries. shared NN weighted graph data: [0.17896412 0.18245182 0.18330523 ... 0.20898813 0.24214146 0.24642361] Converting graph (dtype: float64, shape: (60099, 60099)) has 2934271 nonzero entries. 
---- Ran csr_to_igraph in 1.04 s ---- Resolution: 0.3 ------------------------------ ---- Partitioned BANKSY graph ---- modularity: 0.86 14 unique labels: [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13] ---- Ran partition in 42.84 s ---- No annotated labels Neighbourhood Contribution (Lambda Parameter): 1.0 reduced_pc_18 reduced_pc_18_umap PCA dims to analyse: [18] ==================================================================================================== Setting up partitioner for (nbr decay = scaled_gaussian), Neighbourhood contribution = 1.0, PCA dimensions = 18) ==================================================================================================== ---- Ran find_nn in 46.31 s ---- Nearest-neighbour connectivity graph (dtype: int16, shape: (60099, 60099)) has 3004950 nonzero entries. (after computing shared NN) Allowing nearest neighbours only reduced the number of shared NN from 37956991 to 3003371. ---- Ran shared_nn in 2.91 s ---- -- Multiplying sNN connectivity by weights -- shared NN with distance-based weights graph (dtype: float64, shape: (60099, 60099)) has 2940887 nonzero entries. shared NN weighted graph data: [0.18180313 0.18261261 0.18276584 ... 0.24294105 0.34535196 0.36112631] Converting graph (dtype: float64, shape: (60099, 60099)) has 2940887 nonzero entries. 
---- Ran csr_to_igraph in 1.04 s ---- Resolution: 0.3 ------------------------------ ---- Partitioned BANKSY graph ---- modularity: 0.86 14 unique labels: [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13] ---- Ran partition in 35.33 s ---- No annotated labels Decay type: nonspatial Neighbourhood Contribution (Lambda Parameter): 0.0 reduced_pc_18 reduced_pc_18_umap PCA dims to analyse: [18] ==================================================================================================== Setting up partitioner for (nbr decay = nonspatial), Neighbourhood contribution = 0.0, PCA dimensions = 18) ==================================================================================================== ---- Ran find_nn in 60.67 s ---- Nearest-neighbour connectivity graph (dtype: int16, shape: (60099, 60099)) has 3004950 nonzero entries. (after computing shared NN) Allowing nearest neighbours only reduced the number of shared NN from 52583273 to 3001823. ---- Ran shared_nn in 4.10 s ---- -- Multiplying sNN connectivity by weights -- shared NN with distance-based weights graph (dtype: float64, shape: (60099, 60099)) has 2890583 nonzero entries. shared NN weighted graph data: [0.15518063 0.15640196 0.15681363 ... 0.27790705 0.28128984 0.29219516] Converting graph (dtype: float64, shape: (60099, 60099)) has 2890583 nonzero entries. ---- Ran csr_to_igraph in 1.05 s ---- Resolution: 0.3 ------------------------------ ---- Partitioned BANKSY graph ---- modularity: 0.85 14 unique labels: [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13] ---- Ran partition in 31.90 s ---- No annotated labels After sorting Dataframe Shape of dataframe: (5, 7) Maximum number of labels = 16 Indices of sorted list: [0 2 3 4 1] Expanding labels with ids: [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13] so that ids range from 0 to 15 Label ids zerod: [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13]. 
0 to be inserted between each id: [0 0 0 0 0 0 0 0 0 0 0 0 0 0] 2 extra rows to be randomly inserted: [0 0 0 0 0 0 0 1 0 0 0 0 1] New ids: [ 0 1 2 3 4 5 6 7 9 10 11 12 13 15] ---- Ran expand_labels in 0.00 s ---- ---- Ran match_labels in 0.00 s ---- ---- Ran match_labels in 0.00 s ---- ---- Ran match_labels in 0.00 s ---- ---- Ran match_labels in 0.00 s ---- Matched Labels
| | decay | lambda_param | num_pcs | resolution | num_labels | labels | adata | relabeled |
|---|---|---|---|---|---|---|---|---|
| nonspatial_pc18_nc0.00_r0.30 | nonspatial | 0.00 | 18 | 0.3 | 14 | Label object:\nNumber of labels: 14, number of... | [[[View of AnnData object with n_obs × n_vars ... | Label object:\nNumber of labels: 14, number of... |
| scaled_gaussian_pc18_nc0.25_r0.30 | scaled_gaussian | 0.25 | 18 | 0.3 | 16 | Label object:\nNumber of labels: 16, number of... | [[[View of AnnData object with n_obs × n_vars ... | Label object:\nNumber of labels: 16, number of... |
| scaled_gaussian_pc18_nc0.50_r0.30 | scaled_gaussian | 0.50 | 18 | 0.3 | 14 | Label object:\nNumber of labels: 14, number of... | [[[View of AnnData object with n_obs × n_vars ... | Label object:\nNumber of labels: 14, number of... |
| scaled_gaussian_pc18_nc0.75_r0.30 | scaled_gaussian | 0.75 | 18 | 0.3 | 14 | Label object:\nNumber of labels: 14, number of... | [[[View of AnnData object with n_obs × n_vars ... | Label object:\nNumber of labels: 14, number of... |
| scaled_gaussian_pc18_nc1.00_r0.30 | scaled_gaussian | 1.00 | 18 | 0.3 | 14 | Label object:\nNumber of labels: 14, number of... | [[[View of AnnData object with n_obs × n_vars ... | Label object:\nNumber of labels: 14, number of... |
'Memory usage: 5505.18 MB'
TUTORIAL LECTURE
STEP 4 :
### Save the results into the appropriate slots of the original adata object
p_names = results_df.index
for p_name in p_names:
    labels = results_df.loc[p_name, 'relabeled']
    adata_results = results_df.loc[p_name, 'adata']
    label_name = f"labels_{p_name}"
    print(label_name)
    adata_results.obs[label_name] = np.char.mod('%d', labels.dense)
    adata_results.obs[label_name] = adata_results.obs[label_name].astype('category')
    adata.obs = adata.obs.reindex(adata_results.obs.index)
    adata.obs[label_name] = adata_results.obs[label_name]
    adata.obsm['pc18_banksy'] = adata_results.obsm['reduced_pc_18'].copy()
    adata.obsm['umap18_banksy'] = adata_results.obsm['reduced_pc_18_umap'].copy()
labels_nonspatial_pc18_nc0.00_r0.30
labels_scaled_gaussian_pc18_nc0.25_r0.30
labels_scaled_gaussian_pc18_nc0.50_r0.30
labels_scaled_gaussian_pc18_nc0.75_r0.30
labels_scaled_gaussian_pc18_nc1.00_r0.30
PLOT AND DISCUSS
fig, ax = plt.subplots(figsize=(4, 4))
fig.suptitle('B2. Banksy clustering - labeled by cell-type', fontsize=16, x=0.5, y=1.05)
sc.pl.embedding(adata, basis='umap18_banksy', color='cell_label', show=False, ax=ax, legend_loc='on data', legend_fontsize=6)
fig, axes = plt.subplots(2, 2, figsize=(9, 9))
axes = axes.flatten()
fig.suptitle('B3. Banksy clustering - labeled by Banksy clustering', fontsize=16, x=0.3, y=0.95)
sc.pl.embedding(adata, basis='umap18_banksy',
color='labels_nonspatial_pc18_nc0.00_r0.30',
show=False, ax=axes[0],
legend_loc='on data', legend_fontsize=6)
sc.pl.embedding(adata, basis='umap18_banksy',
color='labels_scaled_gaussian_pc18_nc0.25_r0.30',
show=False, ax=axes[1],
legend_loc='on data', legend_fontsize=6)
sc.pl.embedding(adata, basis='umap18_banksy',
color='labels_scaled_gaussian_pc18_nc0.75_r0.30',
show=False, ax=axes[2],
legend_loc='on data', legend_fontsize=6)
sc.pl.embedding(adata, basis='umap18_banksy',
color='labels_scaled_gaussian_pc18_nc1.00_r0.30',
show=False, ax=axes[3],
legend_loc='on data', legend_fontsize=6)
<Axes: title={'center': 'labels_scaled_gaussian_pc18_nc1.00_r0.30'}, xlabel='umap18_banksy1', ylabel='umap18_banksy2'>
STOP FOR DISCUSSION/LECTURE
import matplotlib.pyplot as plt
import seaborn as sns
import matplotlib.gridspec as gridspec
import matplotlib.lines as mlines
# Plot parameters
# ---------------
color = ppf.colors()
marker = 'o'
s = 1.5
markersize = 9
# ---------------
# Create the figure with GridSpec
fig = plt.figure(figsize=(24, 6.5))
fig.suptitle('B4. Comparison of cell labels, non-spatial and spatial clustering', fontsize=24, x=0.2, y=1.05)
gs = gridspec.GridSpec(1, 5, width_ratios=[1, 0.2, 1, 1, 0.2])
# First subplot for cell labels
ax0 = plt.subplot(gs[0])
sns.scatterplot(data=adata.obs, x='x_location', y='y_location', hue='cell_label', palette=color, s=s, ax=ax0)
ax0.set_title('Cell Label', fontsize=15, loc='left')
ax0_legend = ax0.legend_ # Capture the legend object
ax0.get_legend().remove() # Remove the legend from the plot
# Empty subplot for cell label legend
ax1 = plt.subplot(gs[1])
ax1.axis('off')
handles_cell_labels = [mlines.Line2D([], [], color=legend_handle.get_color(), marker=marker, linestyle='', markersize=markersize)
for legend_handle in ax0_legend.legend_handles]
labels_cell_labels = [t.get_text() for t in ax0_legend.get_texts()]
ax1.legend(handles_cell_labels, labels_cell_labels, loc='center', ncol=1, fontsize=8, title='cell_label')
# Second subplot for non-spatial clustering
ax2 = plt.subplot(gs[2])
sns.scatterplot(data=adata.obs, x='x_location', y='y_location', hue='labels_nonspatial_pc18_nc0.00_r0.30', palette=color, s=s, ax=ax2)
ax2.set_title('Non-spatial clustering', fontsize=15, loc='left')
ax2_legend = ax2.legend_ # Capture the legend object
ax2.get_legend().remove() # Remove the legend from the plot
# Third subplot for spatial clustering
ax3 = plt.subplot(gs[3])
sns.scatterplot(data=adata.obs, x='x_location', y='y_location', hue='labels_scaled_gaussian_pc18_nc0.75_r0.30', palette=color, s=s, ax=ax3)
ax3.set_title('Spatial clustering', fontsize=15, loc='left')
ax3.get_legend().remove() # Remove the legend from the plot
# Empty subplot for cluster legend
ax4 = plt.subplot(gs[4])
ax4.axis('off')
handles_clusters = [mlines.Line2D([], [], color=legend_handle.get_color(), marker='o', linestyle='', markersize=markersize)
for legend_handle in ax2_legend.legend_handles]
labels_clusters = [t.get_text() for t in ax2_legend.get_texts()]
ax4.legend(handles_clusters, labels_clusters, loc='center', ncol=1, fontsize=9, title='clusters')
# Adjust the space between the subplots
plt.subplots_adjust(wspace=0.01)
sns.despine()
plt.tight_layout()
#----------------------
ppf.get_memory_usage() ### monitor memory usage
'Memory usage: 5505.18 MB'
import matplotlib.pyplot as plt
import seaborn as sns
import matplotlib.gridspec as gridspec
import matplotlib.lines as mlines
# Plot parameters
# ---------------
# Create the figure with GridSpec
fig = plt.figure(figsize=(30, 15))
fig.suptitle('B5. Degree of cluster resolution on collective grouping of cell-types', fontsize=27, x=0.2, y=1.05)
color = ppf.colors()
marker = 'o'
s = 2.7
markersize = 12
# Create a main GridSpec with two rows and one column
gs = gridspec.GridSpec(2, 1, height_ratios=[1, 1])
# Create a GridSpec for the top row with three equal width columns
top_gs = gridspec.GridSpecFromSubplotSpec(1, 3, subplot_spec=gs[0], width_ratios=[1, 1, 1])
# Create a GridSpec for the bottom row with specified width ratios
bottom_gs = gridspec.GridSpecFromSubplotSpec(1, 4, subplot_spec=gs[1], width_ratios=[1, 1, 0.2, 0.8])
# ---------------
# First subplot: non-spatial clustering
ax0 = plt.subplot(top_gs[0])
sns.scatterplot(data=adata.obs, x='x_location', y='y_location', hue='labels_nonspatial_pc18_nc0.00_r0.30', palette=color, s=s, ax=ax0)
ax0.set_title('Non-spatial clustering', fontsize=18, loc='left')
ax0_legend = ax0.legend_ # Capture the legend object
ax0.get_legend().remove() # Remove the legend from the plot
# Second subplot: spatial clustering, nc = 0.25
ax1 = plt.subplot(top_gs[1])
sns.scatterplot(data=adata.obs, x='x_location', y='y_location', hue='labels_scaled_gaussian_pc18_nc0.25_r0.30', palette=color, s=s, ax=ax1)
ax1.set_title('Spatial clustering - 0.25', fontsize=18, loc='left')
ax1.get_legend().remove() # Remove the legend from the plot
# Third subplot: spatial clustering, nc = 0.50
ax2 = plt.subplot(top_gs[2])
sns.scatterplot(data=adata.obs, x='x_location', y='y_location', hue='labels_scaled_gaussian_pc18_nc0.50_r0.30', palette=color, s=s, ax=ax2)
ax2.set_title('Spatial clustering - 0.50', fontsize=18, loc='left')
ax2.get_legend().remove() # Remove the legend from the plot
# Fourth subplot: spatial clustering, nc = 0.75
ax3 = plt.subplot(bottom_gs[0])
sns.scatterplot(data=adata.obs, x='x_location', y='y_location', hue='labels_scaled_gaussian_pc18_nc0.75_r0.30', palette=color, s=s, ax=ax3)
ax3.set_title('Spatial clustering - 0.75', fontsize=18, loc='left')
ax3.get_legend().remove() # Remove the legend from the plot
# Fifth subplot: spatial clustering, nc = 1.00
ax4 = plt.subplot(bottom_gs[1])
sns.scatterplot(data=adata.obs, x='x_location', y='y_location', hue='labels_scaled_gaussian_pc18_nc1.00_r0.30', palette=color, s=s, ax=ax4)
ax4.set_title('Spatial clustering - 1.00', fontsize=18, loc='left')
ax4.get_legend().remove() # Remove the legend from the plot
# Empty subplot for cluster legend
ax5 = plt.subplot(bottom_gs[2])
ax5.axis('off')
handles_clusters = [mlines.Line2D([], [], color=legend_handle.get_color(), marker='o', linestyle='', markersize=markersize)
for legend_handle in ax0_legend.legend_handles]
labels_clusters = [t.get_text() for t in ax0_legend.get_texts()]
ax5.legend(handles_clusters, labels_clusters, loc='center', ncol=1, fontsize=8, title='clusters')
# Adjust the space between the subplots
plt.subplots_adjust(wspace=0.01)
sns.despine()
plt.tight_layout()
#----------------------
ppf.get_memory_usage() ### monitor memory usage
/tmp/ipykernel_323931/3714081611.py:31: UserWarning: The palette list has more values (23) than needed (14), which may not be intended. (hue='labels_nonspatial_pc18_nc0.00_r0.30', ax=ax0)
/tmp/ipykernel_323931/3714081611.py:39: UserWarning: The palette list has more values (23) than needed (16), which may not be intended. (hue='labels_scaled_gaussian_pc18_nc0.25_r0.30', ax=ax1)
/tmp/ipykernel_323931/3714081611.py:45: UserWarning: The palette list has more values (23) than needed (14), which may not be intended. (hue='labels_scaled_gaussian_pc18_nc0.50_r0.30', ax=ax2)
/tmp/ipykernel_323931/3714081611.py:52: UserWarning: The palette list has more values (23) than needed (14), which may not be intended. (hue='labels_scaled_gaussian_pc18_nc0.75_r0.30', ax=ax3)
/tmp/ipykernel_323931/3714081611.py:58: UserWarning: The palette list has more values (23) than needed (14), which may not be intended. (hue='labels_scaled_gaussian_pc18_nc1.00_r0.30', ax=ax4)
'Memory usage: 5505.18 MB'
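The cluster columns follow a fixed naming pattern: `pc` is the number of principal components, `nc` the spatial neighbourhood-mixing weight (0.00 is purely non-spatial), and `r` the clustering resolution. A small sketch (the helper name `cluster_key` is illustrative) that composes these keys so the panels above can be generated in a loop instead of hard-coding each column name:

```python
def cluster_key(method, n_pcs, nc, res):
    """Compose a label column name like 'labels_scaled_gaussian_pc18_nc0.75_r0.30'."""
    return f"labels_{method}_pc{n_pcs}_nc{nc:.2f}_r{res:.2f}"

# the five columns plotted in figure B5
keys = [cluster_key("nonspatial", 18, 0.00, 0.30)]
keys += [cluster_key("scaled_gaussian", 18, nc, 0.30) for nc in (0.25, 0.50, 0.75, 1.00)]

print(keys[3])  # labels_scaled_gaussian_pc18_nc0.75_r0.30
```

With the keys in a list, each panel becomes one iteration of `for ax, key in zip(axes, keys): ...`, which keeps titles and hue columns in sync.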
STOP FOR DISCUSSION/LECTURE
import matplotlib.gridspec as gridspec
# Create the figure with GridSpec
fig = plt.figure(figsize=(15, 4.5))
fig.suptitle('B6. Comparison of cell labels, non-spatial and spatial clustering', fontsize=16, x=0.2, y=1.05)
gs = gridspec.GridSpec(1, 3, width_ratios=[1, 1, 0.2])
# First plot
ax1 = fig.add_subplot(gs[0])
sns.histplot(data=adata.obs, hue='labels_nonspatial_pc18_nc0.00_r0.30', x='cell_label', palette=color, multiple='stack', shrink=0.8, ax=ax1)
ax1.get_legend().remove() # Remove the legend from the first plot
ax1.xaxis.set_tick_params(rotation=45)
ax1.set_title('non-spatial clustering of cell types', loc='left')
for label in ax1.get_xticklabels():
label.set_ha('right')
# Second plot
ax2 = fig.add_subplot(gs[1])
sns.histplot(data=adata.obs, hue='labels_scaled_gaussian_pc18_nc0.75_r0.30', x='cell_label', palette=color, multiple='stack', shrink=0.8, ax=ax2)
ax2.set_title('spatial clustering of cell types', loc='left')
ax2_legend = ax2.legend_ # Get the legend from the second plot
ax2.get_legend().remove() # Remove the legend from the second plot
ax2.xaxis.set_tick_params(rotation=45)
for label in ax2.get_xticklabels():
label.set_ha('right')
# Add the legend to the figure in the third GridSpec cell
ax3 = fig.add_subplot(gs[2])
ax3.axis('off')
handles, labels = ax2_legend.legend_handles, [t.get_text() for t in ax2_legend.get_texts()]
fig.legend(handles, labels, loc='center right', ncol=1, fontsize=8, title='cell_label', bbox_to_anchor=(.985, 0.55))
sns.despine()
plt.tight_layout()
plt.show()
#----------------------
ppf.get_memory_usage() ### monitor memory usage
/tmp/ipykernel_323931/4088751005.py:11: UserWarning: The palette list has more values (23) than needed (14), which may not be intended. (hue='labels_nonspatial_pc18_nc0.00_r0.30', ax=ax1)
/tmp/ipykernel_323931/4088751005.py:20: UserWarning: The palette list has more values (23) than needed (14), which may not be intended. (hue='labels_scaled_gaussian_pc18_nc0.75_r0.30', ax=ax2)
'Memory usage: 5539.49 MB'
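The stacked histograms give a visual comparison of annotations against clusters; a contingency table makes the overlap quantitative. A sketch with toy labels (stand-ins for `adata.obs['cell_label']` and one of the cluster columns), using `pd.crosstab` plus a simple per-type purity score:

```python
import pandas as pd

# toy stand-ins for the annotation and cluster columns
cell_label = pd.Series(["Astro", "Astro", "L2/3 IT", "L2/3 IT", "Micro"], name="cell_label")
clusters = pd.Series(["0", "0", "1", "1", "2"], name="cluster")

# rows = annotated cell types, columns = cluster assignments
ct = pd.crosstab(cell_label, clusters)
print(ct.loc["Astro", "0"])  # 2

# fraction of each cell type captured by its dominant cluster
purity = ct.max(axis=1) / ct.sum(axis=1)
print(purity.min())  # 1.0 in this toy data: every type maps to a single cluster
```

On the real object this would be `pd.crosstab(adata.obs['cell_label'], adata.obs['labels_scaled_gaussian_pc18_nc0.75_r0.30'])`; low-purity rows flag cell types that the clustering splits or merges.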
STOP FOR DISCUSSION/LECTURE
* TO-DO
adata.write_h5ad(out+'module3/TgCRND8_17_8mths/adata_module3b_banksy.h5ad') ### save the anndata object
REVIEW FOR MODULE 4
END OF MODULE 3 : NORMALIZATION & SPATIAL/NON-SPATIAL CLUSTERING
Thank you, and see you in the next module, where we will tackle segmentation.